The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations
Abstract
The Parallel Meaning Bank is a corpus of translations annotated with shared, formal meaning representations, comprising over 11 million words divided over four languages (English, German, Italian, and Dutch). Our approach is based on cross-lingual projection: automatically produced (and manually corrected) semantic annotations for English sentences are mapped onto their word-aligned translations, assuming that the translations are meaning-preserving. The semantic annotation consists of five main steps: (i) segmentation of the text into sentences and lexical items; (ii) syntactic parsing with Combinatory Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and (v) compositional semantic analysis based on Discourse Representation Theory. These steps are performed using statistical models trained in a semi-supervised manner. The employed annotation models are all language-neutral. Our first results are promising.
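The projection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `project_tags`, the tag labels, the example sentence pair, and the word alignment are all hypothetical, and the rule of keeping the first tag that reaches a target token is an assumption made only to keep the example short.

```python
from typing import List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Copy each source token's tag onto the target token aligned with it.

    `alignment` holds (source_index, target_index) pairs. Target tokens
    that receive no link stay None.
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        # Assumption: the first link to reach a target token wins.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


# Hypothetical example: placeholder labels, not the corpus's tag set.
english = ["She", "did", "not", "sleep"]
english_tags = ["PRONOUN", "PAST", "NEGATION", "EVENT"]
dutch = ["Zij", "sliep", "niet"]
# Hand-written alignment; "did" has no counterpart in the translation.
alignment = [(0, 0), (2, 2), (3, 1)]

projected = project_tags(english_tags, alignment, len(dutch))
for token, tag in zip(dutch, projected):
    print(token, tag)
```

In this sketch, unaligned source tokens simply contribute nothing to the target side, which is one simple way to handle translations that are not word-for-word.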
Similar resources
Evaluating Scoped Meaning Representations
Semantic parsing offers many opportunities to improve natural language understanding. We present a semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the semantics of negation, modals, quantification, and presupposition triggers. The semantic formalism is based on Discourse Representa...
Optimality in Analysis, Generation, and Learning: Towards a Robust Computational Architecture for Corpus-based Studies of Syntax
This paper describes a computational architecture for accessing implicit information about the grammar of the languages included in a parallel corpus and exploiting it in an Optimality Theory-style learning approach. Previous work on OT learning presupposes the existence of training data in which the underlying input has been annotated. This is an idealization that does not reflect the natural l...
Optimality in Analysis, Generation and Learning: towards a Robust Computational Architecture for Corpus-based Studies of Syntax
This paper describes a computational architecture for accessing implicit information about the grammar of the languages included in a parallel corpus and exploiting it in an Optimality Theory-style learning approach. Previous work on OT learning presupposes the existence of training data in which the underlying input has been annotated. This is an idealization that does not reflect the natural ...
Multilingual Distributed Representations without Word Alignment
Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic repres...
Distributed representations for compositional semantics
The mathematical representation of semantics is a key issue for Natural Language Processing (NLP). A lot of research has been devoted to finding ways of representing the semantics of individual words in vector spaces. Distributional approaches—meaning distributed representations that exploit co-occurrence statistics of large corpora—have proved popular and successful across a number of tasks. H...